Transformer induced enhanced feature engineering for contextual similarity detection in text

Abstract

Availability of large data storage systems has resulted in the digitization of information. Question-and-answer communities like Quora and Stack Overflow take advantage of such systems to provide information to users. However, as the amount of stored information gets larger, it becomes difficult to keep track of the existing information, especially duplication. This work presents a similarity detection technique that can be used to identify levels of similarity in textual data based on the context in which the information was provided. The technique, transformer-based contextual similarity detection (TCSD), uses a combination of bidirectional encoder representations from transformers (BERT) and similarity metrics to derive features from the data. The derived features are used to train an ensemble model for similarity detection. Experiments were performed using a question data set. Results and comparisons indicate that the proposed model detects similarity with an accuracy of 92.5%, representing high efficiency.
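The feature-derivation step described in the abstract can be illustrated with a minimal sketch. It assumes that each question has already been encoded into a fixed-length embedding vector (for example by a BERT encoder, which is not run here) and computes a few common similarity metrics between two such vectors. The specific metrics (cosine, Euclidean, Manhattan) and the function names are illustrative assumptions; the abstract does not say which metrics the authors use.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Straight-line (L2) distance between u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def manhattan_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(a - b) for a, b in zip(u, v))


def similarity_features(u: Sequence[float], v: Sequence[float]) -> Dict[str, float]:
    """Hypothetical helper: turn a pair of embeddings into a feature dictionary.

    In a pipeline like the one the abstract describes, u and v would be
    sentence embeddings of the two questions; here they are plain lists.
    """
    if len(u) != len(v):
        raise ValueError("embeddings must have the same length")
    return {
        "cosine": cosine_similarity(u, v),
        "euclidean": euclidean_distance(u, v),
        "manhattan": manhattan_distance(u, v),
    }


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real embeddings.
    question_a = [0.2, 0.7, 0.1]
    question_b = [0.25, 0.65, 0.05]
    print(similarity_features(question_a, question_b))
```

Feature dictionaries of this kind, one per question pair, could then serve as input rows for an ensemble classifier that predicts whether the pair is a duplicate.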

Similar resources

Contextual feature selection for text classification

We present a simple approach for the classification of “noisy” documents using bigrams and named entities. The approach combines conventional feature selection with a contextual approach to filter out passages around selected features. Originally designed for call for tender documents, the method can be useful for other web collections that also contain non-topical contents. Experiments are c...

Feature Engineering for Text Classification

Most research in text classification to date has used a “bag of words” representation in which each feature corresponds to a single word. This paper examines some alternative ways to represent text based on syntactic and semantic relationships between words (phrases, synonyms and hypernyms). We describe the new representations and try to justify our hypothesis that they could improve the perfor...

Contextual Anomaly Detection in Text Data

We propose using side information to further inform anomaly detection algorithms of the semantic context of the text data they are analyzing, thereby considering both divergence from the statistical pattern seen in particular datasets and divergence seen from more general semantic expectations. Computational experiments show that our algorithm performs as expected on data that reflect real-worl...

Similarity Guided Feature Labeling for Lesion Detection

The performance of automatic lesion detection is often affected by the intra- and inter-subject feature variations of lesions and normal anatomical structures. In this work, we propose a similarity-guided sparse representation method for image patch labeling, with three aspects of similarity information modeling, to reduce the chance that the best reconstruction of a feature vector does not pro...

Features Based Text Similarity Detection

As the Internet helps us cross cultural borders by providing different information, plagiarism issues are bound to arise. As a result, plagiarism detection is increasingly in demand to address this issue. Different plagiarism detection tools have been developed based on various detection techniques. Nowadays, the fingerprint matching technique plays an important role in those detection tools. However, ...

Journal

Journal title: Bulletin of Electrical Engineering and Informatics

Year: 2022

ISSN: 2302-9285

DOI: https://doi.org/10.11591/eei.v11i4.3284